Submit a single zip-compressed file that has the name: YourLastName_Assignment_5 that has the following files:
In this assignment, you will:
Create a folder or directory with all supplementary files with your last name at the beginning of the folder name, compress that folder with zip compression, and post the zip-archived folder under the assignment link in Canvas. The following files should be included in an archive folder/directory that is uploaded as a single zip-compressed file. (Use zip, not StuffIt or any 7z or any other compression method.)
Formatting Python Code When programming in Python, refer to Kenneth Reitz’ PEP 8: The Style Guide for Python Code: http://pep8.org/ (Links to an external site.)Links to an external site. There is the Google style guide for Python at https://google.github.io/styleguide/pyguide.html (Links to an external site.)Links to an external site. Comment often and in detail.
Recent watchdog report published by Chicago Tribune indcated that food safety inspectors overlook hundreds of day cares in the city of Chicago.
The key take away from the Chicago Tribune watchdog report is that the city had only 33 working field inspectors to cover the entire city of Chicago. Many of the facilties serve food for Children, and while few fail inspectionns, many escape routine inspections.
This is a classic resource allocation problem. In this assignment, our goal is to identify the hot-spots (areas that have facilities serving food to children and have failed inspections in the past) on the Chicago map to dispatch inspectors to.
To achive our goal, we need the following:
The CSV file for dataset of the city of chicago is obtained from the data portal for the city of Chicago. Here th elink for the city of Chicago data portal City of Chicago Data Portal
Lets load the CSV file into a DataFrame object and see the nature of the data that we have.
Description of the dataset:
# Lets load the CSV Chicago Food Inspections dataset into a dataframe
import pandas as pd
df = pd.read_csv("Chicago_Food_Inspections.csv")
df.head()
| Inspection ID | DBA Name | AKA Name | License # | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Results | Violations | Latitude | Longitude | Location | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2144807 | SAMMY'S RED HOT | SAMMY'S RED HOT | 2578852.0 | Restaurant | Risk 1 (High) | 238 W DIVISION ST | CHICAGO | IL | 60610.0 | 02/09/2018 | License | Pass | 27. TOILET ROOMS ENCLOSED CLEAN, PROVIDED WITH... | 41.903938 | -87.636215 | (41.90393760463335, -87.63621470016516) |
| 1 | 2144802 | CAFE NILLY | NILLY CAFE | 2578631.0 | Restaurant | Risk 1 (High) | 60 E ADAMS ST | CHICAGO | IL | 60603.0 | 02/08/2018 | License | Pass | 32. FOOD AND NON-FOOD CONTACT SURFACES PROPERL... | 41.879646 | -87.625485 | (41.879645907113094, -87.6254853513935) |
| 2 | 2144800 | EVITA ARGENTINIAN STEAKHOUSE | EVITA | 2464488.0 | Restaurant | Risk 1 (High) | 6112 N LINCOLN AVE | CHICAGO | IL | 60659.0 | 02/08/2018 | Canvass | Pass | 31. CLEAN MULTI-USE UTENSILS AND SINGLE SERVIC... | 41.993151 | -87.712099 | (41.993151143293765, -87.71209940511191) |
| 3 | 2144791 | PHO SPICIER THAI | PHO SPICIER THAI | 2578881.0 | NaN | All | 1320 W DEVON AVE | CHICAGO | IL | 60660.0 | 02/08/2018 | License | Not Ready | NaN | 41.998261 | -87.663944 | (41.99826122453937, -87.6639441429001) |
| 4 | 2144789 | RED SNAPPER | JIMMY'S BEST | 2232836.0 | Restaurant | Risk 1 (High) | 1347 E 87TH ST | CHICAGO | IL | 60619.0 | 02/08/2018 | Complaint | Fail | 19. OUTSIDE GARBAGE WASTE GREASE AND STORAGE A... | 41.736731 | -87.591144 | (41.736730601766894, -87.59114387212324) |
There are few fields in the dataset of interest for us:
We are also interested in any field that mentioned (or misspelled) the word Children
There are possibilities that the data entry clerk might've made some typos and misspellings and there are different words meant to indicate the same thing, some examples of this:
To perform different queries to retrieve the relevant inspection records, we will store the dataset in a NoSQL database engine ElasticSearch.
For more information on elastic search visit ElasticSearch
The three major platofrms are supported:
Startup ElasticSearch Server
After you install ElasticSearch, go to the directory where you installed ElasticSearch under elasticsearch-6.2.3\bin directory and type from the terminal/command prompt the following command: elasticsearch
We need elasticsearch package to connect to ElasticSearch Servers
To install elastic search pakage, execute following command from the command/terminal windows:
#Import Elascticsearch and helpers from elasticsearch
from elasticsearch import Elasticsearch, helpers
es=Elasticsearch('http://student:spsdata@129.105.248.25:9200')
Inspection records are insreted into ElasticSearch engine using the bulk Api of elastic search.
Here is the link API DOCS for the API documentation.
The query is used to retrieve data from ElasticSearch servers that match certain filters.
For information about the syntax and semantics for query, you can read the docs at the following URL QUERY DOCS
We will also use the scroll to retrive the data matching the our query. For more information about scroll, you can read the docs ta the following URL Scroll DOCS
We create our query to rertieve the inspections records we are interested in three experiements and will compare the results for each:
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "*Children*", #using regex of children to match all posssible combinations of "Children"
"fields": ["Facility Type","Violations","DBA Name"] #Multi-field matching query
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query, scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
print('sid = ', sid)
print('Scroll Size = ', scroll_size)
sid = DnF1ZXJ5VGhlbkZldGNoCgAAAAABTkpUFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KWRZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSlEWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkpTFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KUhZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSlUWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkpWFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KVxZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSlgWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkpaFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0E= Scroll Size = 601
type(results)
dict
# results
We need to create a list-of-lists of the two fields, (Latitude and Longitude) for the HeatMap
# results['hits']['hits']
len(results['hits']['hits'])
601
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']: #Iterating each results of the qurey
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude'])) #Appending Latitude and Longitude into the list
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id'] #Changing the scroll-id
scroll_size = len(results['hits']['hits'])
print("the total number of match with children using wild card:",count)
the total number of match with children using wild card: 601
document.keys()
dict_keys(['Inspection ID', 'DBA Name', 'AKA Name', 'License #', 'Facility Type', 'Risk', 'Address', 'City', 'State', 'Zip', 'Inspection Date', 'Inspection Type', 'Results', 'Violations', 'Latitude', 'Longitude', 'Location'])
list_of_lAT_LONG_pairs[:3]
[[41.8814369069, -87.6659213595], [41.760441801, -87.6735652436], [41.9531127244, -87.7800185741]]
len(list_of_lAT_LONG_pairs)
601
import folium
from folium import plugins
print(folium.__version__)
0.12.1
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map
# Lets plot the query matches on Chicago HeatMap
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
Now lets try to retrieve documents using ElasticSearch fuzziness
The fuzzy query generates all possible matching terms that are within the maximum edit distance specified in fuzziness.
For information about the syntax and semantics for fuziness, you can read the docs at the following URL fuzziness
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "Children",
"fuzziness": "1",
"fields": ["Facility Type","Violations","DBA Name"]
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']:
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude']))
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id']
scroll_size = len(results['hits']['hits'])
print("Total number of query matches with children using fuziness:",count)
Total number of query matches with children using fuziness: 141
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "Children's",
"fuzziness": "1",
"fields": ["Facility Type","Violations","DBA Name"]
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']:
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude']))
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id']
scroll_size = len(results['hits']['hits'])
print("Total number of match with Children's using fuziness:",count)
Total number of match with Children's using fuziness: 451
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map
# Lets plot the query matches for "Children's" on Chicago HeatMap
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
Despite the fact that the city of Chicago has the department of Business Affairs and Consumer Protection to revoke business licensses to protect consumers, it appears many businesses with frequent violations have obtained new licenses under the same DBA name
Facilities that serve children can be classified under different Facility Types:
We will use ELasticSearch and Folium to plot on the map those facilities that failed inspection at least 5 times with risk high.
query ={
'size' : 10000,
'query': {
"bool" : {
"should":[ {'match' : {'Facility Type': {"query" : 'Daycare (2 - 6 Years)',"operator":"and"}}},
{'match' : {'Facility Type':{"query" : 'Daycare Above and Under 2 Years',"operator": "and"}}},
{'match' : {'Facility Type':{"query" : 'CHILDRENS SERVICES FACILITY',"operator" : "and"}}},
],
"minimum_should_match" : 1,
"filter" : [{"match" : {'Results': {"query": 'Fail', "operator": "and"}}},
{"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}}}
]
}
},
"aggs" : {
"selected_dbas" :{
"terms" : {
"field" : "DBA Name.keyword",
"min_doc_count": 5,
"size" :10000
},
"aggs": {
"top_dba_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
# results
list_of_lAT_LONG_pairs = []
for dba_bucket in results["aggregations"]["selected_dbas"]["buckets"]:
if "top_dba_hits" in dba_bucket and "hits" in dba_bucket["top_dba_hits"] and "hits" in dba_bucket["top_dba_hits"]["hits"]:
for hit in dba_bucket["top_dba_hits"]["hits"]["hits"]:
if "_source" in hit:
if "Latitude" in hit["_source"] and "Longitude" in hit["_source"]:
list_of_lAT_LONG_pairs.append([hit["_source"]["Latitude"], hit["_source"]["Longitude"]])
# Lets dumps the LAt and LONG
# list_of_lAT_LONG_pairs
# Lets dump the hits per bucket into a datframe object for all buckets
row_index =0
df_top_frequent_violators = pd.DataFrame()
for dba_bucket in results["aggregations"]["selected_dbas"]["buckets"]:
if "top_dba_hits" in dba_bucket and "hits" in dba_bucket["top_dba_hits"] and "hits" in dba_bucket["top_dba_hits"]["hits"]:
doc_count = dba_bucket['doc_count']
for hit in dba_bucket["top_dba_hits"]["hits"]["hits"]:
score = hit['_score']
if "_source" in hit:
row_index += 1
df_frequent_violator = pd.DataFrame(hit['_source'],index =[row_index])
df_frequent_violator['doc_count'] = doc_count
df_frequent_violator['score'] = score
df_top_frequent_violators = df_top_frequent_violators.append(df_frequent_violator)
df_top_frequent_violators
| Inspection ID | DBA Name | AKA Name | License # | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Results | Violations | Latitude | Longitude | Location | doc_count | score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1319663 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 07/17/2013 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.258452 |
| 2 | 1229713 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 3793.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/20/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.199016 |
| 3 | 1515476 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 12/29/2014 | Complaint | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 4 | 1229852 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 1194190.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/28/2012 | Canvass Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 5 | 1386187 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/08/2015 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.095845 |
| ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... | ... |
| 147 | 285066 | THE EDSEL ALBERT AMMONS NURSER | THE EDSEL ALBERT AMMONS NURSER | 15803.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 09/03/2010 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.215062 |
| 148 | 1235065 | THE EDSEL ALBERT AMMONS NURSER | THE EDSEL ALBERT AMMONS NURSER | 15803.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 09/06/2012 | License Re-Inspection | Fail | 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.215062 |
| 149 | 1158446 | THE EDSEL ALBERT AMMONS NURSER | THE EDSEL ALBERT AMMONS NURSER | 15803.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 09/04/2012 | License Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.112285 |
| 150 | 545232 | THE EDSEL ALBERT AMMONS NURSER | THE EDSEL ALBERT AMMONS NURSER | 15803.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 02/18/2011 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.025301 |
| 151 | 1234922 | THE EDSEL ALBERT AMMONS NURSER | THE EDSEL ALBERT AMMONS NURSER | 15803.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 08/28/2012 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.024600 |
151 rows × 19 columns
# Lets print the number of violations for every DBA NAME
df_top_frequent_violators['DBA Name'].value_counts()
BUSY BUMBLE BEE ACADEMY DAYCARE 9 BOTTLES TO BOOKS LEARNING CENTER 8 AMAZING GRACE DAYCARE CENTER 7 A CHILD'S WORLD EARLY LEARNING CENTER 7 COMMONWEALTH DAYCARE CENTER 6 EARLY CHILDHOOD EDUCARE CENTER 6 KIDS R FIRST LEARNING ACADEMY 6 FIRMAN COMMUNITY SERVICES 6 Little People's Day Care & Kindergarten, Inc. 6 THE WORLD IS YOUR'S CHILD CARE & LEARNING CENTER INC. 6 LITTLE KIDS VILLAGE LEARNING 6 LINCOLN KING DAY CARE 6 DISCOVERY LEARNING ACADEMY, INC. 6 JELLYBEAN LEARNING CENTER 6 KENYATTA'S DAYCARE 5 LAKE & PULASKI CHILD DEVELOPMENT CENTER 5 MOLADE' CHILD DEVELOPMENT CENTER 5 CENTRO INFANTIL 5 THE EDSEL ALBERT AMMONS NURSER 5 MONTESSORI ACDY. INFT/TOD. CNT 5 ANGELS 5 GREATER INSTITUTE AME CHURCH 5 GRANT DAY CARE INC 5 ADA S MCKINLEY MAGGIE DRUMMON 5 THE CRYSTAL PALACE EARLY LITERACY ZONE 5 EZZARD CHARLES DAYCARE CENTER 5 Name: DBA Name, dtype: int64
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map
# Lets plot the top frequent violators on Chicago HeatMap
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
As you might have guessed by now, it must be really cheap to do so, those frequent violators reobtain business license multiple times under the same business name for only $165 application fee based on the official numbers published on the City of Chicago - Business Licensing
And it appears the city of Chicago is willinig to rubber-stamp the approval of the application for only $165, rather than imposing the very simple rule: ( 3 strikes and you are out )
The PDF document your are submitting must have the source code and the output for the following four requirements
Provide your comparative analysis for the results obtained from 3 experiments you executed above
The first experiment using regex resulted in 601 results. Experiment 2 and 3 resulted in significantly less results (141 for experiment 2 and 145 for experiment 3). For experiments 2 and 3 we only used "fuzziness" with a score of one. This means that the term we entered could only have one edit to match. Comparatively, experiment 1 used regex with "*" at the beginning and end of "Children" that matches any character or quantity, so this means it could potentially match characters with a much higher fuzziness score. For example, an object with "grandchildren" could have been a result in experiment 1, whereas experiment 2 and 3 would not have had it (elasticsearch queries are also not case sensitive). Interestingly after only briefly looking at all the results, I believe the column or characteristic that pulled "children" the most was violations.
</font>
Rerun Experiments #1, #2, #3 but searching for "Child" matches
#Experiment 1 with "child"
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "*Child*", #using regex of children to match all posssible combinations of "Children"
"fields": ["Facility Type","Violations","DBA Name"] #Multi-field matching query
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query, scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
print('sid = ', sid)
print('Scroll Size = ', scroll_size)
sid = DnF1ZXJ5VGhlbkZldGNoCgAAAAABTkp9FmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KfBZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSnkWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkp7FmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KehZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSn4WbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkp_FmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KgRZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSoAWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkqCFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0E= Scroll Size = 774
# results['hits']['hits']
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']: #Iterating each results of the qurey
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude'])) #Appending Latitude and Longitude into the list
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id'] #Changing the scroll-id
scroll_size = len(results['hits']['hits'])
print("the total number of match with children using wild card:",count)
the total number of match with children using wild card: 774
list_of_lAT_LONG_pairs[:3]
[[41.8814369069, -87.6659213595], [41.760441801, -87.6735652436], [41.9531127244, -87.7800185741]]
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
#experiment 2 with child
#zero results with "Childs"
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "Child",
"fuzziness": "1",
"fields": ["Facility Type","Violations","DBA Name"]
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']:
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude']))
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id']
scroll_size = len(results['hits']['hits'])
print("Total number of query matches with children using fuziness:",count)
Total number of query matches with children using fuziness: 158
#experiment 3 with child
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "Child's",
"fuzziness": "1",
"fields": ["Facility Type","Violations","DBA Name"]
}
}
]
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
sid = results['_scroll_id']
scroll_size = results['hits']['total']
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']:
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude']))
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id']
scroll_size = len(results['hits']['hits'])
print("Total number of match with Children's using fuziness:",count)
Total number of match with Children's using fuziness: 8
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
# Lets plot the query matches for "Children's" on Chicago HeatMap
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
In Experiment #4 we have obtained the list of frequent vilators, produce a table that shows DBA Name, number of violations and number of licenses issued for every DBA Name
#base answer off of this df
df_top_frequent_violators.head()
| Inspection ID | DBA Name | AKA Name | License # | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Results | Violations | Latitude | Longitude | Location | doc_count | score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1319663 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 07/17/2013 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.258452 |
| 2 | 1229713 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 3793.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/20/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.199016 |
| 3 | 1515476 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 12/29/2014 | Complaint | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 4 | 1229852 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 1194190.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/28/2012 | Canvass Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 5 | 1386187 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/08/2015 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.095845 |
#first we need to rename some columns to apply the filtering syntax below this cell
rename_df = df_top_frequent_violators.copy()
rename_df.rename(columns = {'DBA Name':'DBAName'}, inplace=True)
rename_df.rename(columns = {'License #':'License'}, inplace=True)
rename_df.head()
| Inspection ID | DBAName | AKA Name | License | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Results | Violations | Latitude | Longitude | Location | doc_count | score | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1 | 1319663 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 07/17/2013 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.258452 |
| 2 | 1229713 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 3793.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/20/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.199016 |
| 3 | 1515476 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 12/29/2014 | Complaint | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 4 | 1229852 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 1194190.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/28/2012 | Canvass Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 |
| 5 | 1386187 | BUSY BUMBLE BEE ACADEMY DAYCARE | BUSY BUMBLE BEE ACADEMY DAYCARE | 2215472.0 | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/08/2015 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.095845 |
#get number of violation counts
value_counts = df_top_frequent_violators['DBA Name'].value_counts()
df_val_counts = pd.DataFrame(value_counts)
df_val_counts = df_val_counts.reset_index()
df_val_counts.columns = ['DBA Name', 'Violation count']
df_val_counts.head()
| DBA Name | Violation count | |
|---|---|---|
| 0 | BUSY BUMBLE BEE ACADEMY DAYCARE | 9 |
| 1 | BOTTLES TO BOOKS LEARNING CENTER | 8 |
| 2 | AMAZING GRACE DAYCARE CENTER | 7 |
| 3 | A CHILD'S WORLD EARLY LEARNING CENTER | 7 |
| 4 | COMMONWEALTH DAYCARE CENTER | 6 |
#get license count number
license_counts = rename_df.groupby('DBAName').License.nunique()
df_lic_counts = pd.DataFrame(license_counts)
df_lic_counts = df_lic_counts.reset_index()
df_lic_counts.columns = ['DBA Name', 'License # Count']
df_lic_counts.head()
| DBA Name | License # Count | |
|---|---|---|
| 0 | A CHILD'S WORLD EARLY LEARNING CENTER | 2 |
| 1 | ADA S MCKINLEY MAGGIE DRUMMON | 2 |
| 2 | AMAZING GRACE DAYCARE CENTER | 2 |
| 3 | ANGELS | 2 |
| 4 | BOTTLES TO BOOKS LEARNING CENTER | 2 |
#we can check the amount of license numbers with the below
dbs = df_top_frequent_violators.groupby(['DBA Name','License #'])
dbs.first()
| Inspection ID | AKA Name | Facility Type | Risk | Address | City | State | Zip | Inspection Date | Inspection Type | Results | Violations | Latitude | Longitude | Location | doc_count | score | ||
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| DBA Name | License # | |||||||||||||||||
| A CHILD'S WORLD EARLY LEARNING CENTER | 1357825.0 | 1235192 | A CHILD'S WORLD EARLY LEARNING CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 2145 E 83RD ST | CHICAGO | IL | 60617.0 | 09/12/2012 | Complaint | Fail | 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS... | 41.744280 | -87.571635 | (41.74427989606148, -87.5716351762223) | 7 | 8.199016 |
| 1768092.0 | 68443 | A CHILD'S WORLD EARLY LEARNING CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 2145 E 83RD ST | CHICAGO | IL | 60617.0 | 06/03/2010 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.744280 | -87.571635 | (41.74427989606148, -87.5716351762223) | 7 | 8.082044 | |
| ADA S MCKINLEY MAGGIE DRUMMON | 21679.0 | 1072222 | ADA S MCKINLEY MAGGIE DRUMMON | Daycare (2 - 6 Years) | Risk 1 (High) | 4301 S WABASH AVE | CHICAGO | IL | 60653.0 | 10/29/2012 | License Re-Inspection | Fail | 38. VENTILATION: ROOMS AND EQUIPMENT VENTED AS... | 41.816425 | -87.624506 | (41.8164253885832, -87.62450556122445) | 5 | 8.199016 |
| 2215740.0 | 2102863 | ADA S MCKINLEY MAGGIE DRUMMON | Daycare (2 - 6 Years) | Risk 1 (High) | 4301 S WABASH AVE | CHICAGO | IL | 60653.0 | 10/31/2017 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.816425 | -87.624506 | (41.8164253885832, -87.62450556122445) | 5 | 8.024600 | |
| AMAZING GRACE DAYCARE CENTER | 2129979.0 | 545695 | AMAZING GRACE DAYCARE CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 11123-11125 S HALSTED ST | CHICAGO | IL | 60628.0 | 12/20/2011 | License Re-Inspection | Fail | 24. DISH WASHING FACILITIES: PROPERLY DESIGNED... | 41.691646 | -87.642214 | (41.69164609748754, -87.6422140544927) | 7 | 8.724060 |
| 2215901.0 | 1591253 | AMAZING GRACE DAYCARE CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 11123-11125 S HALSTED ST | CHICAGO | IL | 60628.0 | 12/04/2015 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.691646 | -87.642214 | (41.69164609748754, -87.6422140544927) | 7 | 8.724060 | |
| ANGELS | 2042561.0 | 1234770 | ANGELS | Daycare (2 - 6 Years) | Risk 1 (High) | 3941-3943 S INDIANA AVE | CHICAGO | IL | 60653.0 | 08/21/2012 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.822370 | -87.621420 | (41.82237019007892, -87.62142025266414) | 5 | 8.215062 |
| 2216065.0 | 1746314 | ANGELS | Daycare (2 - 6 Years) | Risk 1 (High) | 3941-3943 S INDIANA AVE | CHICAGO | IL | 60653.0 | 10/11/2016 | License | Fail | 26. ADEQUATE NUMBER, CONVENIENT, ACCESSIBLE, P... | 41.822370 | -87.621420 | (41.82237019007892, -87.62142025266414) | 5 | 8.082044 | |
| BOTTLES TO BOOKS LEARNING CENTER | 1926534.0 | 1285277 | BOTTLES TO BOOKS LEARNING CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 6014 S RACINE AVE | CHICAGO | IL | 60636.0 | 09/17/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.784576 | -87.654775 | (41.78457591499572, -87.6547753761994) | 8 | 8.724060 |
| 1943545.0 | 233392 | BOTTLES TO BOOKS LEARNING CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 6014 S RACINE AVE | CHICAGO | IL | 60636.0 | 04/29/2010 | Canvass Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.784576 | -87.654775 | (41.78457591499572, -87.6547753761994) | 8 | 8.724060 | |
| BUSY BUMBLE BEE ACADEMY DAYCARE | 3793.0 | 1229713 | BUSY BUMBLE BEE ACADEMY DAYCARE | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/20/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.199016 |
| 1194190.0 | 1229852 | BUSY BUMBLE BEE ACADEMY DAYCARE | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 06/28/2012 | Canvass Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.112285 | |
| 2215472.0 | 1319663 | BUSY BUMBLE BEE ACADEMY DAYCARE | Daycare (2 - 6 Years) | Risk 1 (High) | 6450 S COTTAGE GROVE AVE | CHICAGO | IL | 60637.0 | 07/17/2013 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.777092 | -87.606004 | (41.777092394888655, -87.60600377956905) | 9 | 8.258452 | |
| CENTRO INFANTIL | 1352154.0 | 1206215 | CENTRO INFANTIL | Daycare Above and Under 2 Years | Risk 1 (High) | 2739 W DIVISION ST | CHICAGO | IL | 60622.0 | 05/07/2012 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.902822 | -87.695990 | (41.9028223612357, -87.69599039701163) | 5 | 8.724060 |
| 2031968.0 | 250268 | CENTRO INFANTIL | Daycare Above and Under 2 Years | Risk 1 (High) | 2739 W DIVISION ST | CHICAGO | IL | 60622.0 | 05/25/2010 | License | Fail | 9. WATER SOURCE: SAFE, HOT & COLD UNDER CITY P... | 41.902822 | -87.695990 | (41.9028223612357, -87.69599039701163) | 5 | 8.519373 | |
| 2215516.0 | 1948254 | CENTRO INFANTIL | Daycare Above and Under 2 Years | Risk 1 (High) | 2739 W DIVISION ST | CHICAGO | IL | 60622.0 | 07/27/2016 | Canvass | Fail | 19. OUTSIDE GARBAGE WASTE GREASE AND STORAGE A... | 41.902822 | -87.695990 | (41.9028223612357, -87.69599039701163) | 5 | 8.640399 | |
| COMMONWEALTH DAYCARE CENTER | 25805.0 | 1229478 | COMMONWEALTH DAYCARE CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 140 W 81ST ST | CHICAGO | IL | 60620.0 | 06/07/2012 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.747342 | -87.628526 | (41.74734238484423, -87.62852601416522) | 6 | 8.258452 |
| DISCOVERY LEARNING ACADEMY, INC. | 2103691.0 | 586094 | DISCOVERY LEARNING ACADEMY, INC. | Daycare Above and Under 2 Years | Risk 1 (High) | 1318 W 95TH ST | CHICAGO | IL | 60643.0 | 08/02/2011 | License Re-Inspection | Fail | 34. FLOORS: CONSTRUCTED PER CODE, CLEANED, GOO... | 41.721409 | -87.656200 | (41.721409457750035, -87.65620011485993) | 6 | 8.640399 |
| 2216144.0 | 1562185 | DISCOVERY LEARNING ACADEMY, INC. | Daycare Above and Under 2 Years | Risk 1 (High) | 1318 W 95TH ST | CHICAGO | IL | 60643.0 | 08/13/2015 | License | Fail | 24. DISH WASHING FACILITIES: PROPERLY DESIGNED... | 41.721409 | -87.656200 | (41.721409457750035, -87.65620011485993) | 6 | 8.582428 | |
| EARLY CHILDHOOD EDUCARE CENTER | 15758.0 | 673275 | EARLY CHILDHOOD EDUCARE CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 5044 S WABASH AVE | CHICAGO | IL | 60615.0 | 01/23/2012 | License | Fail | 13. NO EVIDENCE OF RODENT OR INSECT INFESTATIO... | 41.802458 | -87.624433 | (41.8024577384093, -87.62443259498743) | 6 | 8.550328 |
| 15759.0 | 673284 | EARLY CHILDHOOD EDUCARE CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 5044 S WABASH AVE | CHICAGO | IL | 60615.0 | 01/26/2012 | License Re-Inspection | Fail | 13. NO EVIDENCE OF RODENT OR INSECT INFESTATIO... | 41.802458 | -87.624433 | (41.8024577384093, -87.62443259498743) | 6 | 8.694059 | |
| 2215670.0 | 1631680 | EARLY CHILDHOOD EDUCARE CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 5044 S WABASH AVE | CHICAGO | IL | 60615.0 | 02/16/2016 | License Re-Inspection | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.802458 | -87.624433 | (41.8024577384093, -87.62443259498743) | 6 | 8.694958 | |
| EZZARD CHARLES DAYCARE CENTER | 30849.0 | 1072458 | EZZARD CHARLES DAYCARE CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 7946 S ASHLAND AVE | CHICAGO | IL | 60620.0 | 06/13/2013 | Complaint | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.748958 | -87.663555 | (41.74895804211779, -87.66355545886083) | 5 | 8.095845 |
| 2215585.0 | 2135661 | EZZARD CHARLES DAYCARE CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 7946 S ASHLAND AVE | CHICAGO | IL | 60620.0 | 01/19/2018 | Complaint | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.748958 | -87.663555 | (41.74895804211779, -87.66355545886083) | 5 | 8.258452 | |
| FIRMAN COMMUNITY SERVICES | 37404.0 | 1154287 | FIRMAN COMMUNITY SERVICES | Daycare (2 - 6 Years) | Risk 1 (High) | 4910 S DR MARTIN LUTHER KING JR DR | CHICAGO | IL | 60615.0 | 05/03/2012 | License Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.805367 | -87.616633 | (41.80536679084146, -87.61663349388954) | 6 | 8.199016 |
| 37405.0 | 580895 | FIRMAN COMMUNITY SERVICES | Daycare (2 - 6 Years) | Risk 1 (High) | 37 W 47TH ST | CHICAGO | IL | 60609.0 | 05/11/2011 | Canvass | Fail | 26. ADEQUATE NUMBER, CONVENIENT, ACCESSIBLE, P... | 41.809097 | -87.627599 | (41.80909744182056, -87.62759936903998) | 6 | 8.258452 | |
| 1645677.0 | 158402 | FIRMAN COMMUNITY SERVICES | Daycare (2 - 6 Years) | Risk 1 (High) | 5401 S WENTWORTH AVE | CHICAGO | IL | 60609.0 | 04/21/2010 | License | Fail | 33. FOOD AND NON-FOOD CONTACT EQUIPMENT UTENSI... | 41.796235 | -87.630405 | (41.796234875219376, -87.63040492438661) | 6 | 8.024600 | |
| 2215747.0 | 1937388 | FIRMAN COMMUNITY SERVICES | Daycare (2 - 6 Years) | Risk 1 (High) | 37 W 47TH ST | CHICAGO | IL | 60609.0 | 06/23/2016 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.809097 | -87.627599 | (41.80909744182056, -87.62759936903998) | 6 | 8.095845 | |
| GRANT DAY CARE INC | 15719.0 | 417256 | GRANT DAY CARE INC | Daycare (2 - 6 Years) | Risk 1 (High) | 4025 S DREXEL BLVD | CHICAGO | IL | 60653.0 | 11/01/2010 | Canvass Re-Inspection | Fail | 9. WATER SOURCE: SAFE, HOT & COLD UNDER CITY P... | 41.820840 | -87.604769 | (41.82083961560149, -87.60476869191355) | 5 | 8.258452 |
| GREATER INSTITUTE AME CHURCH | 15816.0 | 1235012 | GREATER INSTITUTE AME CHURCH | Daycare (2 - 6 Years) | Risk 1 (High) | 7800 S INDIANA AVE | CHICAGO | IL | 60619.0 | 09/04/2012 | License | Fail | 13. NO EVIDENCE OF RODENT OR INSECT INFESTATIO... | 41.752764 | -87.619955 | (41.75276438249955, -87.61995523185541) | 5 | 8.215062 |
| 2215513.0 | 1386190 | GREATER INSTITUTE AME CHURCH | Daycare (2 - 6 Years) | Risk 1 (High) | 7800 S INDIANA AVE | CHICAGO | IL | 60619.0 | 06/09/2015 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.752764 | -87.619955 | (41.75276438249955, -87.61995523185541) | 5 | 8.199016 | |
| JELLYBEAN LEARNING CENTER | 1546645.0 | 1215098 | JELLYBEAN LEARNING CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 8501 S ASHLAND AVE | CHICAGO | IL | 60620.0 | 05/23/2012 | License | Fail | 21. * CERTIFIED FOOD MANAGER ON SITE WHEN POTE... | 41.739343 | -87.663008 | (41.739343208088016, -87.66300791918849) | 6 | 8.112285 |
| 1914173.0 | 545263 | JELLYBEAN LEARNING CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 358-370 E 71ST ST | CHICAGO | IL | 60619.0 | 04/04/2011 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.765802 | -87.616183 | (41.76580164804015, -87.61618278066202) | 6 | 8.640399 | |
| 1914174.0 | 285098 | JELLYBEAN LEARNING CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 358-370 E 71ST ST | CHICAGO | IL | 60619.0 | 09/07/2010 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.765802 | -87.616183 | (41.76580164804015, -87.61618278066202) | 6 | 8.694958 | |
| 2215994.0 | 1496748 | JELLYBEAN LEARNING CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 358-370 E 71ST ST | CHICAGO | IL | 60619.0 | 09/16/2014 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.765802 | -87.616183 | (41.76580164804015, -87.61618278066202) | 6 | 8.694059 | |
| KENYATTA'S DAYCARE | 30919.0 | 335302 | KENYATTA'S DAYCARE | Daycare Above and Under 2 Years | Risk 1 (High) | 2334 E 75TH ST | CHICAGO | IL | 60649.0 | 09/17/2010 | Canvass | Fail | 41. PREMISES MAINTAINED FREE OF LITTER, UNNECE... | 41.759085 | -87.567448 | (41.75908547586528, -87.56744784945764) | 5 | 8.582428 |
| 1227101.0 | 335301 | KENYATTA'S DAYCARE | Daycare Above and Under 2 Years | Risk 1 (High) | 2334 E 75TH ST | CHICAGO | IL | 60649.0 | 09/17/2010 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.759085 | -87.567448 | (41.75908547586528, -87.56744784945764) | 5 | 8.694059 | |
| 2215571.0 | 1513078 | KENYATTA'S DAYCARE | Daycare Above and Under 2 Years | Risk 1 (High) | 2334 E 75TH ST | CHICAGO | IL | 60649.0 | 12/16/2014 | License | Fail | 9. WATER SOURCE: SAFE, HOT & COLD UNDER CITY P... | 41.759085 | -87.567448 | (41.75908547586528, -87.56744784945764) | 5 | 8.519373 | |
| KIDS R FIRST LEARNING ACADEMY | 1899719.0 | 250522 | KIDS R FIRST LEARNING ACADEMY | Daycare Above and Under 2 Years | Risk 1 (High) | 1155 W 81ST ST | CHICAGO | IL | 60620.0 | 06/03/2010 | License Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.746751 | -87.653409 | (41.74675127864962, -87.65340876316907) | 6 | 8.694958 |
| 2215985.0 | 1961104 | KIDS R FIRST LEARNING ACADEMY | Daycare Above and Under 2 Years | Risk 1 (High) | 1155 W 81ST ST | CHICAGO | IL | 60620.0 | 09/23/2016 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.746751 | -87.653409 | (41.74675127864962, -87.65340876316907) | 6 | 8.694059 | |
| LAKE & PULASKI CHILD DEVELOPMENT CENTER | 1928369.0 | 419760 | LAKE & PULASKI CHILD DEVELOPMENT CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 316 N PULASKI RD | CHICAGO | IL | 60624.0 | 10/22/2010 | Consultation | Fail | 7. WASH AND RINSE WATER: CLEAN AND PROPER TEMP... | 41.885920 | -87.725964 | (41.885919645895974, -87.72596362714377) | 5 | 8.724060 |
| 2216861.0 | 1970408 | LAKE & PULASKI CHILD DEVELOPMENT CENTER | Daycare Above and Under 2 Years | Risk 1 (High) | 316 N PULASKI RD | CHICAGO | IL | 60624.0 | 10/28/2016 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.885920 | -87.725964 | (41.885919645895974, -87.72596362714377) | 5 | 8.582428 | |
| LINCOLN KING DAY CARE | 51048.0 | 417280 | LINCOLN KING DAY CARE | Daycare (2 - 6 Years) | Risk 1 (High) | 4314 S COTTAGE GROVE AVE | CHICAGO | IL | 60653.0 | 11/18/2010 | License | Fail | None | 41.816326 | -87.606847 | (41.816325970337445, -87.6068467379213) | 6 | 8.215062 |
| LITTLE KIDS VILLAGE LEARNING | 1422942.0 | 577505 | LITTLE KIDS VILLAGE LEARNING | Daycare (2 - 6 Years) | Risk 1 (High) | 2656 W 71ST ST | CHICAGO | IL | 60629.0 | 09/01/2011 | License | Fail | 29. PREVIOUS MINOR VIOLATION(S) CORRECTED 7-42... | 41.764689 | -87.690441 | (41.764689317705226, -87.6904408401805) | 6 | 8.199016 |
| 1422943.0 | 577506 | LITTLE KIDS VILLAGE LEARNING | Daycare (2 - 6 Years) | Risk 1 (High) | 2656 W 71ST ST | CHICAGO | IL | 60629.0 | 09/01/2011 | License | Fail | 29. PREVIOUS MINOR VIOLATION(S) CORRECTED 7-42... | 41.764689 | -87.690441 | (41.764689317705226, -87.6904408401805) | 6 | 8.258452 | |
| Little People's Day Care & Kindergarten, Inc. | 1043532.0 | 519372 | Little People's Day Care & Kindergarten | Daycare (2 - 6 Years) | Risk 1 (High) | 7428 N ROGERS AVE | CHICAGO | IL | 60626.0 | 01/31/2012 | Canvass | Fail | 11. ADEQUATE NUMBER, CONVENIENT, ACCESSIBLE, D... | 42.016689 | -87.674150 | (42.01668938211222, -87.67415033629639) | 6 | 8.258452 |
| 2215588.0 | 1953552 | Little People's Day Care & Kindergarten | Daycare (2 - 6 Years) | Risk 1 (High) | 7428 N ROGERS AVE | CHICAGO | IL | 60626.0 | 08/31/2016 | Canvass | Fail | 30. FOOD IN ORIGINAL CONTAINER, PROPERLY LABEL... | 42.016689 | -87.674150 | (42.01668938211222, -87.67415033629639) | 6 | 8.258452 | |
| MOLADE' CHILD DEVELOPMENT CENTER | 1922899.0 | 284600 | MOLADE' CHILD DEVELOPMENT CENTER | Daycare (2 - 6 Years) | Risk 1 (High) | 1120 N LAMON AVE | CHICAGO | IL | 60651.0 | 08/20/2010 | Consultation | Fail | 8. SANITIZING RINSE FOR EQUIPMENT AND UTENSILS... | 41.901087 | -87.748540 | (41.901086765978654, -87.74854019856667) | 5 | 8.112285 |
| MONTESSORI ACDY. INFT/TOD. CNT | 47676.0 | 537498 | MONTESSORI ACDY. INFT/TOD. CNT | Daycare Above and Under 2 Years | Risk 1 (High) | 10232-10234 S HALSTED ST | CHICAGO | IL | 60628.0 | 07/25/2011 | License | Fail | None | 41.707740 | -87.643003 | (41.70774046981763, -87.64300283870763) | 5 | 8.694958 |
| 47677.0 | 537499 | MONTESSORI ACDY. INFT/TOD. CNT | Daycare Above and Under 2 Years | Risk 1 (High) | 10232-10234 S HALSTED ST | CHICAGO | IL | 60628.0 | 07/25/2011 | License | Fail | None | 41.707740 | -87.643003 | (41.70774046981763, -87.64300283870763) | 5 | 8.694059 | |
| 2219985.0 | 1360674 | MONTESSORI ACDY. INFT/TOD. CNT | Daycare Above and Under 2 Years | Risk 1 (High) | 10232-10234 S HALSTED ST | CHICAGO | IL | 60628.0 | 09/04/2013 | License | Fail | 21. * CERTIFIED FOOD MANAGER ON SITE WHEN POTE... | 41.707740 | -87.643003 | (41.70774046981763, -87.64300283870763) | 5 | 8.542657 | |
| THE CRYSTAL PALACE EARLY LITERACY ZONE | 1910208.0 | 543442 | THE CRYSTAL PALACE EARLY LITERACY ZONE | Daycare Above and Under 2 Years | Risk 1 (High) | 1459 W 74TH ST | CHICAGO | IL | 60636.0 | 06/16/2011 | Canvass | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.759377 | -87.661115 | (41.75937734623751, -87.66111539963164) | 5 | 8.694059 |
| 2215971.0 | 1307510 | THE CRYSTAL PALACE EARLY LITERACY ZONE | Daycare Above and Under 2 Years | Risk 1 (High) | 1459 W 74TH ST | CHICAGO | IL | 60636.0 | 06/25/2014 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.759377 | -87.661115 | (41.75937734623751, -87.66111539963164) | 5 | 8.567911 | |
| THE EDSEL ALBERT AMMONS NURSER | 15803.0 | 285066 | THE EDSEL ALBERT AMMONS NURSER | Daycare (2 - 6 Years) | Risk 1 (High) | 549 E 76TH ST | CHICAGO | IL | 60619.0 | 09/03/2010 | License | Fail | 18. NO EVIDENCE OF RODENT OR INSECT OUTER OPEN... | 41.756551 | -87.610690 | (41.75655095611123, -87.61068980246957) | 5 | 8.215062 |
| THE WORLD IS YOUR'S CHILD CARE & LEARNING CENTER INC. | 1649970.0 | 284788 | THE WORLD IS YOUR'S CHILD CARE & LEARNING CENT... | Daycare (2 - 6 Years) | Risk 1 (High) | 8026 S COTTAGE GROVE AVE | CHICAGO | IL | 60619.0 | 08/26/2010 | License | Fail | None | 41.748625 | -87.605311 | (41.7486247573131, -87.6053113596727) | 6 | 8.215062 |
| 2215902.0 | 1387727 | THE WORLD IS YOUR'S CHILD CARE & LEARNING CENT... | Daycare (2 - 6 Years) | Risk 1 (High) | 8026 S COTTAGE GROVE AVE | CHICAGO | IL | 60619.0 | 10/31/2014 | License Re-Inspection | Fail | 14. PREVIOUS SERIOUS VIOLATION CORRECTED, 7-42... | 41.748625 | -87.605311 | (41.7486247573131, -87.6053113596727) | 6 | 8.258452 |
# now merge to df on DBA Name
print(df_lic_counts.shape)
print(df_val_counts.shape)
#same shape
req4 = df_lic_counts.merge(df_val_counts, on='DBA Name')
req4
(26, 2) (26, 2)
| DBA Name | License # Count | Violation count | |
|---|---|---|---|
| 0 | A CHILD'S WORLD EARLY LEARNING CENTER | 2 | 7 |
| 1 | ADA S MCKINLEY MAGGIE DRUMMON | 2 | 5 |
| 2 | AMAZING GRACE DAYCARE CENTER | 2 | 7 |
| 3 | ANGELS | 2 | 5 |
| 4 | BOTTLES TO BOOKS LEARNING CENTER | 2 | 8 |
| 5 | BUSY BUMBLE BEE ACADEMY DAYCARE | 3 | 9 |
| 6 | CENTRO INFANTIL | 3 | 5 |
| 7 | COMMONWEALTH DAYCARE CENTER | 1 | 6 |
| 8 | DISCOVERY LEARNING ACADEMY, INC. | 2 | 6 |
| 9 | EARLY CHILDHOOD EDUCARE CENTER | 3 | 6 |
| 10 | EZZARD CHARLES DAYCARE CENTER | 2 | 5 |
| 11 | FIRMAN COMMUNITY SERVICES | 4 | 6 |
| 12 | GRANT DAY CARE INC | 1 | 5 |
| 13 | GREATER INSTITUTE AME CHURCH | 2 | 5 |
| 14 | JELLYBEAN LEARNING CENTER | 4 | 6 |
| 15 | KENYATTA'S DAYCARE | 3 | 5 |
| 16 | KIDS R FIRST LEARNING ACADEMY | 2 | 6 |
| 17 | LAKE & PULASKI CHILD DEVELOPMENT CENTER | 2 | 5 |
| 18 | LINCOLN KING DAY CARE | 1 | 6 |
| 19 | LITTLE KIDS VILLAGE LEARNING | 2 | 6 |
| 20 | Little People's Day Care & Kindergarten, Inc. | 2 | 6 |
| 21 | MOLADE' CHILD DEVELOPMENT CENTER | 1 | 5 |
| 22 | MONTESSORI ACDY. INFT/TOD. CNT | 3 | 5 |
| 23 | THE CRYSTAL PALACE EARLY LITERACY ZONE | 2 | 5 |
| 24 | THE EDSEL ALBERT AMMONS NURSER | 1 | 5 |
| 25 | THE WORLD IS YOUR'S CHILD CARE & LEARNING CENT... | 2 | 6 |
Use the results of Experiment #4 to plot on the Heatmap those frequent violators who have obtained 3 business licenses or more under the same DBA Name through out the lifetime of their business
I solved this by first realizing that there were not many violators who obtained 3 business liceneses or more. So the easiest way to solve this for me, was to search an exact match on the few frequent violators (although, this is the most manual way and not best for production). I believe this also could have been solved by adjusting the query for > 3.
#let's see how many that is
req4[req4['License # Count']>= 3]
#we can see here that there is only seven
#adjusting our queries to find the lat and long of these businesses might not be too bad
| DBA Name | License # Count | Violation count | |
|---|---|---|---|
| 5 | BUSY BUMBLE BEE ACADEMY DAYCARE | 3 | 9 |
| 6 | CENTRO INFANTIL | 3 | 5 |
| 9 | EARLY CHILDHOOD EDUCARE CENTER | 3 | 6 |
| 11 | FIRMAN COMMUNITY SERVICES | 4 | 6 |
| 14 | JELLYBEAN LEARNING CENTER | 4 | 6 |
| 15 | KENYATTA'S DAYCARE | 3 | 5 |
| 22 | MONTESSORI ACDY. INFT/TOD. CNT | 3 | 5 |
query ={
'size' : 10000,
'query': {
"bool" : {
"should":[ {'match' : {'DBA Name': {"query" : 'BUSY BUMBLE BEE ACADEMY DAYCARE',"operator":"and"}}},
{'match' : {'DBA Name':{"query" : 'CENTRO INFANTIL',"operator": "and"}}},
{'match' : {'DBA Name':{"query" : 'EARLY CHILDHOOD EDUCARE CENTER',"operator" : "and"}}},
{'match' : {'DBA Name':{"query" : 'FIRMAN COMMUNITY SERVICES',"operator" : "and"}}},
{'match' : {'DBA Name':{"query" : 'JELLYBEAN LEARNING CENTER',"operator" : "and"}}},
{'match' : {'DBA Name':{"query" : "KENYATTA'S DAYCARE","operator" : "and"}}},
{'match' : {'DBA Name':{"query" : 'MONTESSORI ACDY. INFT/TOD. CNT',"operator" : "and"}}},
],
"minimum_should_match" : 1,
"filter" : [{"match" : {'Results': {"query": 'Fail', "operator": "and"}}},
{"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}}}
]
}
},
"aggs" : {
"selected_dbas" :{
"terms" : {
"field" : "DBA Name.keyword",
"min_doc_count": 5,
"size" :10000
},
"aggs": {
"top_dba_hits": {
"top_hits": {
"size": 10
}
}
}
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
# results
list_of_lAT_LONG_pairs = []
for dba_bucket in results["aggregations"]["selected_dbas"]["buckets"]:
if "top_dba_hits" in dba_bucket and "hits" in dba_bucket["top_dba_hits"] and "hits" in dba_bucket["top_dba_hits"]["hits"]:
for hit in dba_bucket["top_dba_hits"]["hits"]["hits"]:
if "_source" in hit:
if "Latitude" in hit["_source"] and "Longitude" in hit["_source"]:
list_of_lAT_LONG_pairs.append([hit["_source"]["Latitude"], hit["_source"]["Longitude"]])
# Lets dumps the LAt and LONG
# list_of_lAT_LONG_pairs
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
Plot on the Heatmap those facilites that serve children but failed inspections with high risk, and MICE DROPPINGS were OBSERVED in the Violations; you have to execlude violations that stated NO MICE DROPPINGS were OBSERVED
This could be solved either by using regex, NoSQL include/exclude filters, or phrase matching. I solved this by using phrase matching, but I left a query attempt at the end that used include/exclude filters. For some reason using include/exclude did not work.
#phrase matching query
#another option could be doing a children fuzziness hit instead of matching facility type
query ={
'size' : 10000,
'query': {
"bool" : {
"should":[ {'match' : {'Facility Type': {"query" : 'Daycare (2 - 6 Years)',"operator":"and"}}},
{'match' : {'Facility Type':{"query" : 'Daycare Above and Under 2 Years',"operator": "and"}}},
{'match' : {'Facility Type':{"query" : 'CHILDRENS SERVICES FACILITY',"operator" : "and"}}},
],
"minimum_should_match" : 1,
"filter" : [
{"match" : {'Results': {"query": 'Fail', "operator": "and"}}},
{"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}}},
{"match_phrase" : {'Violations': {"query": "NO MICE DROPPINGS"}}}
# {"match_phrase" : {'Violations': {"query": "<QUERY STRING 2>"}}}
],
# "must_not":{"match_phrase": {'Violations': 'NO MICE DROPPINGS'}}
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
results
{'_scroll_id': 'DnF1ZXJ5VGhlbkZldGNoCgAAAAABTkqhFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KohZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSqMWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkqkFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KpRZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSqcWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkqmFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KqRZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSqgWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkqqFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0E=',
'took': 4,
'timed_out': False,
'_shards': {'total': 10, 'successful': 10, 'skipped': 0, 'failed': 0},
'hits': {'total': 0, 'max_score': None, 'hits': []}}
I noticed with some experimenting that for the database given there actually were no records that had "NO MICE DROPPINGS". The above query proves that. The query structure above and below could be used to specifically match phrases and exclude phrases for your results. Although, I commented those lines out because now that we know there are no records that have "NO MICE DROPPINGS" we can search for "MICE DROPPINGS" and not have to worry about filtering.
#phrase matching query
#should redo with children fuzziness hit instead of matching facility type
query ={
'size' : 10000,
'query': {
"bool" : {
"should":[ {'match' : {'Facility Type': {"query" : 'Daycare (2 - 6 Years)',"operator":"and"}}},
{'match' : {'Facility Type':{"query" : 'Daycare Above and Under 2 Years',"operator": "and"}}},
{'match' : {'Facility Type':{"query" : 'CHILDRENS SERVICES FACILITY',"operator" : "and"}}},
],
"minimum_should_match" : 1,
"filter" : [
{"match" : {'Results': {"query": 'Fail', "operator": "and"}}},
{"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}}},
{"match_phrase" : {'Violations': {"query": "MICE DROPPINGS"}}}
# {"match_phrase" : {'Violations': {"query": "<QUERY STRING 2>"}}}
],
# "must_not":{"match_phrase": {'Violations': 'NO MICE DROPPINGS'}}
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')
# results
#we can use the code from the previous experiments even though we don't need the "count" in this case only the lat and long
sid = results['_scroll_id']
scroll_size = results['hits']['total']
print('sid = ', sid)
print('Scroll Size = ', scroll_size)
sid = DnF1ZXJ5VGhlbkZldGNoCgAAAAABTkqsFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KqxZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSq0WbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkquFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KrxZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSrEWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkq0FmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0EAAAAAAU5KsBZsUk40d0xQbFF6LUNmUUNLaGM5RkNBAAAAAAFOSrIWbFJONHdMUGxRei1DZlFDS2hjOUZDQQAAAAABTkqzFmxSTjR3TFBsUXotQ2ZRQ0toYzlGQ0E= Scroll Size = 210
count = 0
list_of_lAT_LONG_pairs = []
while(scroll_size > 0):
for inspection in results['hits']['hits']: #Iterating each results of the qurey
current_location_lAT_LONG = []
document = inspection['_source']
count = count +1
#defensive coding to ensure we have the fields in the inspection documents
if 'Latitude' in document.keys():
if 'Longitude' in document.keys():
if 'Address' in document.keys():
if(document['Latitude'] != None and document['Longitude'] != None and document['Address'] != None):
current_location_lAT_LONG.append(float(document['Latitude'])) #Appending Latitude and Longitude into the list
current_location_lAT_LONG.append(float(document['Longitude']))
list_of_lAT_LONG_pairs.append(current_location_lAT_LONG)
results = es.scroll(scroll_id = sid, scroll = '2m')
sid = results['_scroll_id'] #Changing the scroll-id
scroll_size = len(results['hits']['hits'])
print("the total number of matches for requirement 5: ",count)
the total number of matches for requirement 5: 210
#add lat and long to map
chicago_map = folium.Map([41.90293279, -87.70769386], zoom_start=11)
chicago_map.add_child(plugins.HeatMap(list_of_lAT_LONG_pairs, radius=15))
chicago_map
#include/exclude query attempt
query = {
'size' : 10000,
'query': {
'bool': {
'must' : [{'match' : {'Results': 'Fail'}}, {"match" : {'Risk': {"query": 'Risk 1 (High)', "operator": "and"}} }, # same as where clasue in SQL
{"query_string": {
"query": "Children",
"fuzziness": "1",
"fields": ["Facility Type","Violations","DBA Name"]
}
}
]
}
},
'aggs':{
'tags':{
'terms':{
'field':'Violations.keyword',
'include':'MICE DROPPINGS',
'exclude':'NO MICE DROPPINGS'
}
}
}
}
results = es.search(index='food_inspections', body=query,scroll='1h')